AITopics | viterbi training

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Neural Information Processing SystemsMar-15-2024, 10:01:08 GMT

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy.

markov process, objective function, probability, (15 more...)

Neural Information Processing Systems

Country:

Asia > Armenia > Yerevan > Yerevan (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

End-to-End Training of a Neural HMM with Label and Transition Probabilities

Mann, Daniel, Raissi, Tina, Michel, Wilfried, Schlüter, Ralf, Ney, Hermann

arXiv.org Artificial IntelligenceOct-9-2023

We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration statistics. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and additionally Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings.

alignment, probability, transition model, (12 more...)

arXiv.org Artificial Intelligence

2310.02724

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Neural Information Processing SystemsApr-6-2023, 13:07:04 GMT

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy.

Add feedback

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Allahverdyan, Armen, Galstyan, Aram

Neural Information Processing SystemsFeb-14-2020, 23:12:18 GMT

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy.

Add feedback

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Allahverdyan, Armen E., Galstyan, Aram

arXiv.org Machine LearningDec-16-2013

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. For more general scenario VT can be worse compared to ML but still capable of correctly recovering most of the parameters.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1312.4551

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Viterbi training in PRISM

Sato, Taisuke, Kubota, Keiichi

arXiv.org Artificial IntelligenceNov-28-2013

VT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \mid \theta)$ by coordinate ascent on parameters $\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First VT in PRISM converges faster than EM in PRISM due to the VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e.\ VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. Also we conducted a similar experiment for classification tasks where a hidden variable is not a prediction target unlike probabilistic grammars. We found that in such a case VT does not necessarily yield superior performance. Third since VT always deals with a single probability of a single explanation, Viterbi explanation, the exclusiveness condition that is imposed on PRISM programs is no more required if we learn parameters by VT. Last but not least we can say that as VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. Furthermore since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S1471068413000677

1303.5659

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Allahverdyan, Armen, Galstyan, Aram

Neural Information Processing SystemsDec-31-2011

We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam's razor for HMM learning. For more general scenario VT can be worse compared to ML but still capable of correctly recovering most of the parameters.

artificial intelligence, bayesian inference, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

A constructive proof of the existence of Viterbi processes

Lember, J., Koloydenko, A.

arXiv.org Machine LearningApr-14-2008

Since the early days of digital communication, hidden Markov models (HMMs) have now been also routinely used in speech recognition, processing of natural languages, images, and in bioinformatics. In an HMM $(X_i,Y_i)_{i\ge 1}$, observations $X_1,X_2,...$ are assumed to be conditionally independent given an ``explanatory'' Markov process $Y_1,Y_2,...$, which itself is not observed; moreover, the conditional distribution of $X_i$ depends solely on $Y_i$. Central to the theory and applications of HMM is the Viterbi algorithm to find {\em a maximum a posteriori} (MAP) estimate $q_{1:n}=(q_1,q_2,...,q_n)$ of $Y_{1:n}$ given observed data $x_{1:n}$. Maximum {\em a posteriori} paths are also known as Viterbi paths or alignments. Recently, attempts have been made to study the behavior of Viterbi alignments when $n\to \infty$. Thus, it has been shown that in some special cases a well-defined limiting Viterbi alignment exists. While innovative, these attempts have relied on rather strong assumptions and involved proofs which are existential. This work proves the existence of infinite Viterbi alignments in a more constructive manner and for a very general class of HMMs.

alignment, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1109/TIT.2010.2040897

0804.2138

Country:

North America > United States (0.93)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Filters

Collaborating Authors

viterbi training

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

End-to-End Training of a Neural HMM with Label and Transition Probabilities

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

Viterbi training in PRISM

Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

A constructive proof of the existence of Viterbi processes